深度学习已在许多神经影像应用中有效。但是,在许多情况下,捕获与小血管疾病有关的信息的成像序列的数量不足以支持数据驱动的技术。此外,基于队列的研究可能并不总是具有用于准确病变检测的最佳或必需成像序列。因此,有必要确定哪些成像序列对于准确检测至关重要。在这项研究中,我们旨在找到磁共振成像(MRI)序列的最佳组合,以深入基于学习的肿瘤周围空间(EPV)。为此,我们实施了一个有效的轻巧U-NET,适用于EPVS检测,并全面研究了来自易感加权成像(SWI),流体侵入的反转恢复(FLAIR),T1加权(T1W)和T2的不同信息组合 - 加权(T2W)MRI序列。我们得出的结论是,T2W MRI对于准确的EPV检测最为重要,并且在深神经网络中掺入SWI,FLAIR和T1W MRI可能会使精度的提高无关。
translated by 谷歌翻译
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.
translated by 谷歌翻译
This paper introduces the shared task of summarizing documents in several creative domains, namely literary texts, movie scripts, and television scripts. Summarizing these creative documents requires making complex literary interpretations, as well as understanding non-trivial temporal dependencies in texts containing varied styles of plot development and narrative structure. This poses unique challenges and is yet underexplored for text summarization systems. In this shared task, we introduce four sub-tasks and their corresponding datasets, focusing on summarizing books, movie scripts, primetime television scripts, and daytime soap opera scripts. We detail the process of curating these datasets for the task, as well as the metrics used for the evaluation of the submissions. As part of the CREATIVESUMM workshop at COLING 2022, the shared task attracted 18 submissions in total. We discuss the submissions and the baselines for each sub-task in this paper, along with directions for facilitating future work in the field.
translated by 谷歌翻译
通常通过过去的选择来告知机器学习中的评估,例如要使用哪些数据集或指标。该标准化可以使用排行榜对平等基础进行比较,但是随着出现更好的替代方案,评估选择变得不佳。这个问题在自然语言生成中尤其相关,该语言需要不断改善的数据集,指标和人类评估以提出确定性的主张。为了使遵循最佳模型评估实践更加容易,我们介绍了GEMV2。新版本的一代,评估和指标基准为数据集,模型和指标开发人员提供了模块化基础架构,以使彼此受益。GEMV2支持40种记录的数据集中51种语言。所有数据集的模型都可以在线评估,我们的交互式数据卡创建和渲染工具使得在Living Benchmark中添加新数据集变得更加容易。
translated by 谷歌翻译
语言模型既展示了定量的改进,又展示了新的定性功能,随着规模的增加。尽管它们具有潜在的变革性影响,但这些新能力的特征却很差。为了为未来的研究提供信息,为破坏性的新模型能力做准备,并改善社会有害的效果,至关重要的是,我们必须了解目前和近乎未来的能力和语言模型的局限性。为了应对这一挑战,我们介绍了超越模仿游戏基准(Big Bench)。 Big Bench目前由204个任务组成,由132家机构的442位作者贡献。任务主题是多样的,从语言学,儿童发展,数学,常识性推理,生物学,物理学,社会偏见,软件开发等等。 Big-Bench专注于被认为超出当前语言模型的功能的任务。我们评估了OpenAI的GPT型号,Google内部密集变压器体系结构和大型基础上的开关稀疏变压器的行为,跨越了数百万到数十亿个参数。此外,一个人类专家评估者团队执行了所有任务,以提供强大的基准。研究结果包括:模型性能和校准都随规模改善,但绝对的术语(以及与评估者的性能相比);在模型类中的性能非常相似,尽管带有稀疏性。逐渐和预测的任务通常涉及大量知识或记忆成分,而在临界规模上表现出“突破性”行为的任务通常涉及多个步骤或组成部分或脆性指标;社交偏见通常会随着含糊不清的环境而随着规模而增加,但这可以通过提示来改善。
translated by 谷歌翻译
制定了具有机器学习模拟(骆驼)项目的宇宙学和天体物理学,通过数千名宇宙的流体动力模拟和机器学习将宇宙学与天体物理学结合起来。骆驼包含4,233个宇宙学仿真,2,049个n-body和2,184个最先进的流体动力模拟,在参数空间中采样巨大的体积。在本文中,我们介绍了骆驼公共数据发布,描述了骆驼模拟的特性和由它们产生的各种数据产品,包括光环,次麦,银河系和空隙目录,功率谱,Bispectra,Lyman - $ \ Alpha $光谱,概率分布函数,光环径向轮廓和X射线光子列表。我们还释放了超过骆驼 - 山姆的数十亿个星系的目录:与Santa Cruz半分析模型相结合的大量N身体模拟。我们释放包含350多个Terabytes的所有数据,并包含143,922个快照,数百万光环,星系和摘要统计数据。我们提供有关如何访问,下载,读取和处理数据AT \ URL {https://camels.readthedocs.io}的进一步技术详细信息。
translated by 谷歌翻译
数据增强是自然语言处理(NLP)模型的鲁棒性评估的重要组成部分,以及增强他们培训的数据的多样性。在本文中,我们呈现NL-Cogmenter,这是一种新的参与式Python的自然语言增强框架,它支持创建两个转换(对数据的修改)和过滤器(根据特定功能的数据拆分)。我们描述了框架和初始的117个变换和23个过滤器,用于各种自然语言任务。我们通过使用其几个转换来分析流行自然语言模型的鲁棒性来证明NL-Upmenter的功效。基础架构,Datacards和稳健性分析结果在NL-Augmenter存储库上公开可用(\ url {https://github.com/gem-benchmark/nl-augmenter})。
translated by 谷歌翻译
心脏磁共振(CMR)图像的多类分割,将数据分离为具有已知结构和构型的解剖组分。最流行的基于CNN的方法是使用像素明智的损失函数优化的,对表征解剖结构的空间扩展特征一无所知。因此,尽管与地面真理共享高空间重叠,但推断的基于CNN的分割可能缺乏连贯性,包括伪造的连接组件,孔和空隙。这样的结果令人难以置信,违反了预期的解剖拓扑。作为响应,已经提出了基于持续的同源性损失功能(单级)以捕获全局解剖特征。我们的工作将这些方法扩展到多级分割任务。我们的损失功能构建了所有类标签和类标签对的丰富拓扑描述,我们使用基于CNN的后处理框架可以预测和统计学上的分割拓扑改进。我们还基于立方复合物和并行执行,提出(并提供)高效的实现,这是第一次在高分辨率3D数据中实现实际应用。我们证明了我们在2D短轴和3D全心CMR细分方面的方法,对两个公开可用数据集的性能进行了详细而忠实的分析。
translated by 谷歌翻译
Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.
translated by 谷歌翻译
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
translated by 谷歌翻译